[% setvar title C<use unicode::representation> and C<no unicode> %]
<div id="archive-notice">
    <h3>This file is part of the Perl 6 Archive</h3>
    <p>To see what is currently happening visit <a href="http://www.perl6.org/">http://www.perl6.org/</a></p>
</div>
<div class='pod'>
<a name='TITLE'></a><h1>TITLE</h1>
<p><code>use unicode::representation</code> and <code>no unicode</code></p>
<a name='VERSION'></a><h1>VERSION</h1>
<pre>  Maintainer: Simon Cozens &lt;<a href='mailto:simon@brecon.co.uk'>simon@brecon.co.uk</a>&gt;
  Date: 25 Sep 2000
  Mailing List: <a href='mailto:perl6-internals@perl.org'>perl6-internals@perl.org</a>
  Number: 300
  Version: 1
  Status: Developing</pre>
<a name='ABSTRACT'></a><h1>ABSTRACT</h1>
<p>Perl 5.6's <code>use bytes</code> is a useful pragma; this RFC stipulates its
intended behaviour in Perl 6 <b>right now</b>.</p>
<a name='DESCRIPTION'></a><h1>DESCRIPTION</h1>
<p>When Perl 5.6 introduced Unicode support, there suddenly became two ways
of handling data: you can handle it as a series of Unicode characters,
or you can manipulate it byte-for-byte. The <code>use bytes</code> pragma was used
to force byte-for-byte manipulation.</p>
<p>The problem with this was mainly of understanding; you had to know when
Perl was handling your data as UTF8 before <code>use bytes</code> made any sense,
and then it looked like you were letting people fiddle with the internal
representation of data, something that should be hidden from them.</p>
<p>This was only <b>really</b> a problem because some data was encoded and some
wasn't. If we're using a consistent Unicode representation internally,
and that's known and documented, the root of the problem goes away;
since everything's going to be converted to UTF8 when it enters Perl, it
makes sense to want to get at that UTF8. It's considerably less messy
now.</p>
<p>But why? Why would someone want to get at the UTF8? One reason, which I
believe to be sufficient enough to justify the pragma, noted in
&quot;Normalisation and <code>unicode::exact</code>&quot; was to compare representations
without decomposition. <code>unicode::exact</code> gives you non-destructive
comparisons, but it doesn't give you byte-for-byte comparisons.
<code>use bytes</code> would do this. Other uses which have been noted on p5p
recently include XML and sending data over the network.</p>
<p>However, I want to keep all the Unicode-related pragmata consistently
named, so I suggest that <code>use bytes</code> be renamed to
<code>use unicode::representation</code>; <code>use bytes</code> appeared to be a rather
confusing name anyway.</p>
<p>As part of the &quot;What does use bytes mean?&quot; discussion/debacle/holy war,
Nick Ing-Simmons expressed the desire for a new pragma which turned off
Unicode altogether. Let's put that in, and call it <code>no unicode</code>.</p>
<a name='IMPLEMENTATION'></a><h1>IMPLEMENTATION</h1>
<p>The exact semantics of <code>no unicode</code> will be trashed out in &quot;Unicode
Combinatorix&quot;; <code>use unicode::representation</code> can be implemented
similarly to <code>use bytes</code>.</p>
<a name='REFERENCES'></a><h1>REFERENCES</h1>
<p><i><a href='http://search.cpan.org/perldoc?bytes'>bytes</a></i></p>
<p>the as-yet-unsubmitted RFCs on</p>
<p>RFC 295: Normalisation and <code>unicode::exact</code></p>
<p>RFC ??: When UTF8 Leaks Out</p>
<p>RFC 312: Unicode Combinatorix</p>
<p>RFC 311: Line Disciplines</p>
</div>
